Open Collaborative Development of the Thai Language Resources for Natural Language Processing
Abstract
Language resources are recognized as an essential component of linguistic infrastructure and as a starting point for Natural Language Processing (NLP) systems and applications. In this paper, we describe the development and use of Thai language resources built on an open collaboration platform through cooperation among research institutes. The resources comprise both text and speech. The text resources are divided into a lexicon database and an annotated corpus. We began developing a corpus-based Thai-English lexicon database (LEXiTRON) in 1994. It originated from a dictionary designed for use in developing a machine translation system. Since then, the Thai part-of-speech (POS) tagset has been designed and evaluated in several applications (word segmentation, machine translation, grapheme-to-phoneme conversion, etc.). Beyond the lexicon database, a POS-tagged corpus (ORCHID) and speech corpora for both synthesis and recognition have been developed, and they serve as an important part of research and development in NLP and human language technology (HLT). These language resources are available for academic research.
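The abstract lists word segmentation among the applications built on the lexicon database. Because written Thai does not separate words with spaces, a lexicon is commonly used to find word boundaries. The sketch below shows greedy longest matching, one common dictionary-based baseline; it is not necessarily the method the authors used. The function name `longest_match_segment` and the three-word lexicon are hypothetical stand-ins, not LEXiTRON's actual format or API.

```python
def longest_match_segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-matching segmentation against a word lexicon (hypothetical sketch)."""
    # Longest candidate worth trying; default=1 handles an empty lexicon.
    max_len = max((len(w) for w in lexicon), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Fallback: emit a single code point when no lexicon word matches.
        # (Real Thai segmenters work on character clusters so that combining
        # vowel and tone marks are never split from their base consonant.)
        match = text[i]
        # Try the longest possible substring first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    # Hypothetical mini-lexicon: "I", "eat", "rice".
    lexicon = {"ฉัน", "กิน", "ข้าว"}
    print(longest_match_segment("ฉันกินข้าว", lexicon))
```

Greedy longest matching is simple and fast, but it can choose a wrong boundary when a long word swallows the start of the next one. This is one reason segmenters are often combined with POS information or statistical models trained on annotated corpora.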